Clustering based on Dirichlet mixtures of attribute ensembles

Author

  • Peter D. Hoff
Abstract

We propose a model-based approach to identifying clusters of objects based on subsets of attributes, so that the attributes that distinguish a cluster from the rest of the population, called an attribute ensemble, may depend on the cluster being considered. The model is based on a Pólya urn cluster model, which is equivalent to a Dirichlet process mixture of multivariate normal distributions. This model-based approach allows application-specific data features to be incorporated into the clustering scheme. For example, in an analysis of genetic CGH array data we account for spatial correlation of genetic abnormalities along the genome.

Key words: nonparametric Bayes, unsupervised learning, subspace clustering, variable selection, COSA.
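As a rough illustration of the Pólya urn cluster model named in the abstract, the sketch below samples a random partition of n objects: each new object joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter. This covers only the generic urn scheme for cluster labels. It does not implement the attribute-ensemble model, the multivariate normal likelihood, or the spatial correlation structure of the paper. The function name `polya_urn_partition` and the parameters `alpha` and `seed` are assumptions made for this example.

```python
import random


def polya_urn_partition(n, alpha, seed=None):
    """Sample cluster labels for n objects from a Polya urn scheme.

    Hypothetical helper for illustration; alpha > 0 is the concentration
    parameter (larger values tend to produce more clusters).
    """
    rng = random.Random(seed)
    labels = []   # labels[i] is the cluster index of object i
    counts = []   # counts[k] is the current size of cluster k
    for i in range(n):
        # Total weight is alpha (new cluster) plus i (objects already placed).
        u = rng.random() * (alpha + i)
        chosen = len(counts)  # default: open a new cluster
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if u < acc:
                chosen = k  # join existing cluster k, weight = its size
                break
        if chosen == len(counts):
            counts.append(1)
        else:
            counts[chosen] += 1
        labels.append(chosen)
    return labels, counts


if __name__ == "__main__":
    labels, counts = polya_urn_partition(20, alpha=1.0, seed=0)
    print("labels:", labels)
    print("cluster sizes:", counts)
```

In a Dirichlet process mixture, each cluster drawn this way would additionally receive its own parameters (for example a mean and variance) from a base distribution, and the observations would be generated from those cluster-specific distributions.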


Similar resources

Clustering based on Dirichlet mixtures of attribute subsets

We discuss a model-based approach to identifying clusters of objects based on subsets of attributes, so that the attributes that distinguish a cluster from the rest of the population may depend on the cluster being considered. The method is based on a Pólya urn cluster model for multivariate means and variances, resulting in a multivariate Dirichlet process mixture model. This particular model-...

Density Modeling and Clustering Using Dirichlet Diffusion Trees

I introduce a family of prior distributions over multivariate distributions, based on the use of a “Dirichlet diffusion tree” to generate exchangeable data sets. These priors can be viewed as generalizations of Dirichlet processes and of Dirichlet process mixtures, but unlike simple mixtures, they can capture the hierarchical structure present in many distributions, by means of the latent diffu...

A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset

Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...

Online clustering via finite mixtures of Dirichlet and minimum message length

This paper presents an online algorithm for mixture model-based clustering. Mixture modeling is the problem of identifying and modeling components in a given set of data. The online algorithm is based on unsupervised learning of finite Dirichlet mixtures and a stochastic approach for estimates updating. For the selection of the number of clusters, we use the minimum message length (MML) approac...

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

Publication date: 2004